Three-dimensional modeling from single photographs

ABSTRACT

A method of obtaining a three-dimensional digital model of an artificial object, made up of a plurality of geometric primitives, the artificial object being in a single two-dimensional photograph, the method comprising: using edge detection to define a two-dimensional outline of the artificial object within the photograph; interactively allowing a user to define two-dimensional profiles of successive ones of the geometric primitives; interactively allowing a user to sweep respective profiles over an extent of a corresponding one of the geometric primitives within the image; generating successive three-dimensional model parts from existing detected edges of the corresponding geometric primitives and the sweeping of the respective profile; and aligning the plurality of three-dimensional model parts to form the three-dimensional model.

RELATED APPLICATION

This application is a continuation of U.S. Patent Application No. 14/177,359 filed on Feb. 11, 2014, which claims the benefit of priority of U.S. Provisional Patent Application No. 61/763,005 filed on Feb. 11, 2013. The contents of the above applications are all incorporated by reference as if fully set forth herein in their entirety.

FIELD AND BACKGROUND OF THE INVENTION

The present invention, in some embodiments thereof, relates to three-dimensional modeling from single photographs and, more particularly but not exclusively, to modeling of manmade objects with straightforward geometry.

The creation and modeling of 3D objects has always been a difficult task even for professionals. First, a mental idea of what the model should look like needs to be formed. This conceptual stage requires creativity and inspiration. Then, the idea needs to be implemented by a series of actions using various geometric modeling tools. These steps take time and demand very high proficiency and a fair amount of skill. By modeling objects from existing photographs one can first alleviate the mental stage. Second, it allows much simpler modeling that can also borrow textures from the image. This forms at least an initial base model that can later be edited and refined. In addition, such abilities can be utilized for manipulating the images themselves using 3D. An example of a suitable object is shown in FIG. 1, in which the left image shows the object, and the right image shows the object after minor rotation. The result is to leave a black hole in the image.

Extracting three-dimensional models from a single photo is still a long way from realization at the current state of technology, as it involves numerous complex tasks: the target object must be separated from its background, and its 3D pose, shape and structure should be recognized from its projection. These tasks are difficult since they require some degree of semantic understanding of the object. To alleviate this problem, complex 3D models can be partitioned into simpler parts, but identifying object parts also requires semantic understanding and is difficult to perform automatically. Moreover, once a 3D shape has been decomposed into parts, the relations between these parts should also be understood and maintained in the final composition.

Related Work

3D Modeling from a single photo. Images have always been an important resource and were used as references in 3D modeling. There are numerous techniques that model shapes from multiple images [26, 28]. However, modeling from a single photograph is more challenging since there is more ambiguity in the observed geometry. Methods to reconstruct an object from a single image usually require some degree of manual intervention. Oh et al. [23] allow the annotation of depth and layer information in a single image and yield impressive image editing at the scene level. Russell et al. [25] build a manually annotated database of 3D scenes to assist recovering scene-level geometry and camera pose. Lau et al. [19] introduced a “Modeling-in-context” concept, allowing complementary objects of a photograph to fit better to other objects in the photo. Jiang et al. [15] recover an architectural model heavily relying on the symmetry of such buildings.

Of particular significance is the work of Xu et al. [30] which models a man-made object observed in a single photograph. Their method relies on matching and warping an existing 3D object to the observed object in the photograph. The warp is constrained by semantic geometric (geo-semantic) constraints. However, the success of their method strongly depends on the existence, and retrieval, of a similar 3D shape.

The task of 3D modeling from a single image is closely related to the endeavor of reconstructing a 3D shape from a sketch [24]. A number of interactive systems have been developed for this purpose [13, 14, 16, 34, 32]. Free-sketched objects however do not necessarily correspond to real man-made objects that may appear in photographs, and there remain problems with modeling such man-made objects, which typically consist of a composition of primitives with certain inter-relations among the components [9, 21], which the systems aimed at free sketches do not approach.

Part-based Modeling. Part-based snapping techniques have been used for modeling 3D objects from sketches. Gingold et al. [10] introduce an interface to generate 3D models from 2D drawings by manually placing 3D primitives. Tsang et al. [29] use a guiding image to assist sketch-based modeling; the user's input curves can snap to the image, and the user is then provided with suggestions for curve completion from a curve database. Recently, Shtof et al. [27] have modeled 3D objects from sketches by snapping primitives. In their system, the user drags and drops an entire 3D primitive onto its place. Since the fitting problem is ambiguous, the silhouettes of the sketches must be semantically labeled, and the sketch is expected to contain some cues that indicate the part boundaries.

Sweep-based Modeling. Sweep-based models have been studied extensively in Computer-Aided Design. Choi and Lee [7] model sweep surfaces by using coordinate transformations and blending. Swirling-Sweepers [1] is a volume preserving modeling technique capable of unlimited stretching, avoiding self-intersection. Hyun et al. [12] and Yoon et al. [33] use sweeping for human and freeform deformation, respectively. Many CAD works also aim at modeling generalized primitives. Kim et al. [17] model and animate generalized cylinders by a translational sweep along the spine or rotational sweep around the spine. Lee [20] models generalized cylinders using direction map representation. Based on generalized cylinders, Murugappan et al. [22] propose an interesting interaction approach to create 3D shapes by hand gestures. None of these methods have been applied for modeling from photographs or sketches.

Semantic Constraints. Gal et al. [9] have introduced a 3D deformation method while preserving some semantic constraints among the object's parts. Such geo-semantic constraints [35] have been shown to be useful to quickly edit or deform man-made models [30, 31]. Li et al. [21] and Shtof et al. [27] reconstruct 3D shapes while simultaneously inferring the global mutual geo-semantic relations among their parts.

Object-Level Image Editing. Unlike traditional image-based editing, object-based editing allows high-level operations. Operating on the object-level requires extensive user interaction [8, 5] or massive data collection [18, 11]. Barrett et al. [4] use wrapping to achieve object-based editing, which is restricted to 3D rotation. Zhou et al. [37] fit a semantic model of a human to an image, allowing an object-based manipulation of a human figure in photographs. Recently, Zheng et al. [36] have proposed using cuboid proxies for semantic image editing. Man-made objects are modeled by a set of cuboid proxies, possibly together with some geometric relations or constraints, allowing their manipulation in the photo.

SUMMARY OF THE INVENTION

The present embodiments provide a method and apparatus for extracting three-dimensional information of objects in single photographs by providing a user with interactivity to draw a cross-section for a part of the object and then sweep the cross-section over the part of the object to which it applies. Unlike certain of the above cited works, the present embodiments may focus on the modeling of a single subject that is observed in a photograph and not the whole scene.

The computer then fits the cross-section to the object outline of which it is aware, and once all parts of the object have been addressed in this way the computer is able to generate a three-dimensional model of the object, which can then be rotated, used in animations, or used in any other way.

Thus, in the present embodiments, the original object is not restricted, as with Xu et al., to prestored shapes. Rather, the embodiments work on geometric primitives, so that any shape that can be deconstructed into geometric primitives can be reconstructed into a 3D object. The reconstructed object is thus composed of these generic primitives, providing larger scope and flexibility.

The prior art teaches snapping, and separately teaches sweeping. The present embodiments combine sweeping and snapping to provide automatic alignment of the primitives into an overall object.

According to an aspect of some embodiments of the present invention there is provided a method of obtaining a three-dimensional digital model of an artificial object made up of a plurality of geometric primitives, the artificial object being in a single two-dimensional photograph, the method comprising:

defining a two-dimensional outline of the artificial object within the photograph;

interactively allowing a user to define cross-sectional profiles of successive ones of the geometric primitives, the cross-sectional profiles defining a third dimension;

interactively allowing a user to provide sweep input to sweep respective defined cross-sectional profiles over an extent of a corresponding one of the geometric primitives within the image, the sweeping generating successive three-dimensional model primitives from existing detected edges of the corresponding geometric primitives and the sweeping of the respective profile; and

aligning the plurality of three-dimensional model primitives to form the three-dimensional model.

The method may comprise interactively allowing the user to explicitly define three dimensions of the geometric primitive using three sweep motions, wherein a first two of the three sweeps define a first and second dimension of the cross-sectional profile and a third sweep defines a main axis of the geometric primitive.

The method may comprise, upon the user sweeping the two-dimensional profile over a respective one of the geometric primitives, dynamically adjusting the two-dimensional profile using a pictorial context on the photograph and automatically snapping photograph lines to the profile.

In an embodiment, the snapping allows the three-dimensional model to include three-dimensional primitives that adhere to the object in the photographs, while maintaining global constraints between the plurality of three-dimensional model primitives composing the object.

The method may comprise optimizing the global constraints while taking into account the snapping and the sweep input.

The method may comprise a post-snapping fit improvement of better fitting the primitive to the image, the better fitting comprising searching for transformations within ±10% of primitive size that create a better fit of the primitive's projection to the profile.
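By way of non-limiting illustration only, the post-snapping fit improvement may be pictured as a small bounded search around the snapped parameters. The following Python sketch is an assumption made for illustration and is not taken from the embodiments: the function name `refine_fit`, the use of a regular grid of candidate offsets, and the caller-supplied `cost` function (standing in for a measure of how well the primitive's projection fits the profile) are all hypothetical.

```python
import itertools

def refine_fit(params, size, cost, steps=5, limit=0.10):
    """Search offsets within +/- limit * size around the snapped parameters.

    params: tuple of floats describing the primitive (hypothetical encoding)
    size:   characteristic size of the primitive
    cost:   callable mapping a parameter tuple to a fit error (lower is better)
    steps:  odd number (>= 3) of candidate offsets per parameter
    """
    if steps < 3 or steps % 2 == 0:
        raise ValueError("steps must be an odd number >= 3")
    half = steps // 2
    # Evenly spaced offsets in [-limit * size, +limit * size], including 0.
    offsets = [limit * size * k / half for k in range(-half, half + 1)]
    best = tuple(params)
    best_cost = cost(best)
    # Exhaustively try every combination of per-parameter offsets.
    for delta in itertools.product(offsets, repeat=len(best)):
        candidate = tuple(p + d for p, d in zip(params, delta))
        c = cost(candidate)
        if c < best_cost:
            best, best_cost = candidate, c
    return best, best_cost
```

Because every candidate stays within the stated bound, such a search can only nudge the primitive and cannot replace the position obtained from the user's sweep and the snapping.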

In an embodiment, the defining the two dimensional outline comprises edge detecting.

An embodiment may comprise estimating a field of view angle from which the photograph was taken in order to estimate and compensate for distortion of the primitives within the photograph.

An embodiment may comprise using relationships between the primitives in order to define global constraints for the object.

An embodiment may comprise obtaining geo-semantic relations between the primitives to define the three-dimensional digital model, and encoding the relations as part of the model.

An embodiment may comprise inserting the three-dimensional digital model into a second photograph.

The method may comprise extracting a texture from the photograph and applying the texture to sides of the three-dimensional model not visible in the photograph.

In an embodiment, the defining the cross-sectional profiles comprises defining a shape and then distorting the shape to correspond to a three-dimensional orientation angle.

The method may comprise applying different constraints to different parts respectively of a given one of the geometric primitives, or locally modifying different parts respectively of a given one of the geometric primitives.

The method may comprise snapping the first two user sweep motions to the photograph lines, using the endpoints of the first two user sweep motions along with an anchor point on a respective primitive to create a three-dimensional orthogonal system for a respective primitive.

The method may comprise supporting a constraint, the constraint being one member of the group consisting of: parallelism, orthogonality, collinear axis endpoints, overlapping axis endpoints, coplanar axis endpoints and coplanar axes, and for the member testing whether a pair of components is close to satisfying the member, and if the member is satisfied or close to satisfied then adding the constraint to a respective one of the primitives.

In the method, aligning the three dimensional primitives may comprise finding an initial position for all primitives together by changing only their depth to adhere to geo-semantic constraints, followed by modifying the shapes of the primitives.

The present embodiments may include a user interface for carrying out the above method. The user interface may comprise an outline view of a current photograph on which view to carry out interactive sweeping to define cross sections of respective primitives and on which to snap the cross-sections. The user interface may further comprise a solid model view and a texture view respectively of the current photograph, and selectability for user selection between different basic cross-sectional shapes.

According to a second aspect of the present invention there may be provided a method of digitally forming a three-dimensional geometric primitive from a two-dimensional geometric primitive from a two-dimensional photograph, comprising:

interactively obtaining user input to draw a two-dimensional cross section of the primitive and then using further user input to sweep the cross-section over a length of the primitive.

A geometric primitive is a part of an object whose cross section does not change, or which does not change discontinuously. That is to say the part is a geometric primitive if it has a cross section that remains constant or changes continuously along the length of the part.

According to a third aspect of the present invention there is provided a method of forming a derivation of a photograph, the photograph incorporating a two dimensional representation of a three-dimensional object, the two-dimensional representation being a rotation of an original two-dimensional representation, the rotation being formed by:

carrying out the method described hereinabove to form a three-dimensional model of the original two-dimensional representation;

rotating the three-dimensional model; and projecting the rotated three-dimensional model onto a two-dimensional surface to form the derivation.
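By way of non-limiting illustration only, the rotating and projecting steps may be sketched in Python as follows. The sketch assumes, purely for illustration, a pinhole camera at the origin looking along the positive z axis, a rotation about a vertical axis through a chosen pivot, and model points lying in front of the camera (z > 0). The function names `rotate_about_y` and `project_pinhole` and the `focal` parameter are hypothetical and do not appear in the embodiments.

```python
import math

def rotate_about_y(points, angle_deg, pivot=(0.0, 0.0, 0.0)):
    """Rotate 3D points about a vertical (y) axis passing through pivot."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    px, _, pz = pivot
    rotated = []
    for x, y, z in points:
        dx, dz = x - px, z - pz
        # Standard rotation about the y axis, applied relative to the pivot.
        rotated.append((px + c * dx + s * dz, y, pz - s * dx + c * dz))
    return rotated

def project_pinhole(points, focal=1.0):
    """Perspective-project 3D points (z > 0) onto the image plane z = focal."""
    return [(focal * x / z, focal * y / z) for x, y, z in points]

# Usage: rotate a model about its own position, then re-project it.
model = [(1.0, 0.0, 5.0), (0.0, 1.0, 5.0)]
image_points = project_pinhole(rotate_about_y(model, 90.0, pivot=(0.0, 0.0, 5.0)))
```

Rotating about a pivot inside the object, rather than about the camera, keeps the object in place in the derived image while its orientation changes.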

Unless otherwise defined, all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention pertains. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of embodiments of the invention, exemplary methods and/or materials are described below. In case of conflict, the patent specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and are not intended to be necessarily limiting.

Implementation of the method and/or system of embodiments of the invention can involve performing or completing selected tasks manually, automatically, or a combination thereof. Moreover, according to actual instrumentation and equipment of embodiments of the method and/or system of the invention, several selected tasks could be implemented by hardware, by software or by firmware or by a combination thereof using an operating system.

For example, hardware for performing selected tasks according to embodiments of the invention could be implemented as a chip or a circuit. As software, selected tasks according to embodiments of the invention could be implemented as a plurality of software instructions being executed by a computer using any suitable operating system. In an exemplary embodiment of the invention, one or more tasks according to exemplary embodiments of method and/or system as described herein are performed by a data processor, such as a computing platform for executing a plurality of instructions. The data processor may include a volatile memory for storing instructions and/or data and/or a non-volatile storage, for example, a magnetic hard-disk, flash memory and/or removable media, for storing instructions and/or data. A network connection may be provided and a display and/or a user input device such as a keyboard or mouse may be available as necessary.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawings will be provided by the Office upon request and payment of the necessary fee.

Some embodiments of the invention are herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of embodiments of the invention. In this regard, the description taken with the drawings makes apparent to those skilled in the art how embodiments of the invention may be practiced.

In the drawings:

FIG. 1A is a simplified flow chart illustrating a procedure for forming a 3D model from a single 2D photograph according to an embodiment of the present invention;

FIG. 1B is a simplified diagram showing an object being extracted from a 2D photograph for modeling;

FIGS. 2A-2F schematically illustrate the various stages of extracting the object, modeling, and reinserting a rotated version of the image back into the original photograph, according to embodiments of the present invention;

FIGS. 3A-3E are simplified diagrams illustrating drawing a two-dimensional profile of a primitive and sweeping the profile over a curved axis of the primitive, the profile snapping to the successively shrinking edges of the primitive, according to embodiments of the present invention;

FIG. 4 is a simplified diagram illustrating a series of graphic primitives and their representation as a series of three sweeps respectively, according to embodiments of the present invention;

FIGS. 5A-5B are simplified diagrams illustrating alignment of different primitives based on axis points, according to embodiments of the present invention;

FIG. 6 is a simplified diagram illustrating the use of sweeps for representation of cubes according to embodiments of the present invention;

FIGS. 7A-7E are a series of photographs in a top row, from which objects are extracted, modeled and manipulated in a second row and then replaced in the original photograph in the third row according to embodiments of the present invention;

FIGS. 8A-8D illustrate how parts can be taken from different images to deal with lack of detail or occlusion of parts in one or other of the images according to embodiments of the present invention;

FIG. 9 shows four series of three images, in each of which a detail from an original object is replicated according to embodiments of the present invention;

FIG. 10 shows two series of photographs in which an object in the leftmost image in each series is modified in different ways according to embodiments of the present invention;

FIG. 11 is a collage made up of objects from individual photographs, the collage generated according to embodiments of the present invention; and

FIG. 12 is a simplified diagram showing the generation of 3D models according to embodiments of the present invention from originating sketches.

DESCRIPTION OF SPECIFIC EMBODIMENTS OF THE INVENTION

The present invention, in some embodiments thereof, relates to three-dimensional modeling based on a single photograph.

The present embodiments may provide an interactive technique for modeling 3D objects having a clear geometry, typically but not exclusively man-made objects, by extracting them from a single photograph. The modeling of a 3D shape from a single photograph requires the understanding of the components of the shape, their projections, and relations. These are particularly difficult for automatic algorithms but are simple cognitive tasks for humans. The present interactive method may intelligently combine the cognitive ability of humans with the computational accuracy of the machine. To extract an object from a given photograph, the user draws cross-sectional profiles of parts of the object and sweeps the profile over the part using simple gestures, to progressively define a 3D body that snaps to the shape outline in the photo. The generated part adheres to various geo-semantic constraints imposed by the global 3D structure. As explained below, with the present intelligent interactive modeling tool, the daunting task of object extraction is made simple. Once the 3D object is extracted, it can be quickly edited and placed back into photos or 3D scenes, offering object-driven photo editing tasks which are impossible to achieve in image-space.

More particularly, the present disclosure teaches an interactive technique to model 3D man-made objects from a single photograph utilizing the interplay between humans and computers, while leveraging the strengths of both. The human is involved in perceptual tasks such as recognition, positioning, and partitioning, while the computer performs tasks which are computationally intensive or require accuracy. Guided by the present method, the final model of the object includes its geometry and structure, as well as some of its semantics. This allows the extracted model to be readily available for intelligent editing, while maintaining the shape's semantics.

The present approach is based on the observation that many man-made objects can be decomposed into simpler parts that can be represented by a generalized cylinder or similar primitives. An idea of the present method is to provide the user with an interactive tool to guide the creation of 3D editable primitives. The tool is based on a relatively simple modeling gesture referred to herein as sweep-snap. The sweep-snap gesture allows the user to explicitly define the three dimensions of the primitive using three sweeps. The first two sweeps define the first and second dimension of a 2D profile and the third, longer, sweep is used to define the main curved axis of the primitive.

While the user sweeps the primitive, the computer program dynamically adjusts the progressive profile by sensing the pictorial context on the photograph and automatically snapping to it. With such sweep-snap operations the user models 3D parts that adhere to the object in the photographs, while the computer automatically maintains global constraints with other primitives composing the object. The present embodiments use geo-semantic constraints that define the semantic and geometric relations between the primitive parts of the final 3D model such as parallelism and collinearity.

The present method thus disambiguates the three dimensional problem by an explicit sweep move of a 2D entity. The present embodiments adopt a geo-semantic constraint inference to assist the modeling of man-made objects. Thanks to the presently disclosed user interaction, the present embodiments may be able to achieve faster modeling than the prior art systems listed above and can support fuzzy and noisy image edges as well as clear sketches and photographs. The present embodiments obviate any requirement for sketch classification and avoid the annoyance of false-positives when geo-semantic optimization falls into a local minimum.

As mentioned above, Zheng et al. [36] proposed using cuboid proxies for semantic image editing. Man-made objects are modeled by a set of cuboid proxies, possibly together with some geometric relations or constraints, allowing their manipulation in the photo. The method of the present embodiments achieves similar image manipulations with a larger variety and more complex man-made models with more kinds of geo-semantic constraints. The present embodiments may also recover a full 3D model of the object rather than just a proxy, and support various shapes rather than just cuboids. Using the user interaction the present embodiments avoid the need for unreliable image segmentation and unsupervised model fitting. In the present embodiments, the user may provide vital information in the modeling process with little effort.

Using sweep-snap technology, non-professionals can extract various 3D objects from photographs. These objects may then be used to build a 3D scene or to alter the image itself by manipulating or editing the objects or their parts in 3D, and pasting them back into the photograph. The present disclosure contains results of a variety of such examples.

Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not necessarily limited in its application to the details of construction and the arrangement of the components and/or methods set forth in the following description and/or illustrated in the drawings and/or the Examples. The invention is capable of other embodiments or of being practiced or carried out in various ways.

Referring now to the drawings, FIG. 1A is a simplified diagram illustrating a method of extracting a three-dimensional model from a single two-dimensional photograph according to a first embodiment of the present invention.

The object in the photograph is typically made up of several geometric parts, and needs to be extracted from the single two-dimensional photograph. Edge detection may be used to determine the bounds of the object from the photograph. A typical object is that shown in FIG. 1B. Edge detection may thus define a two-dimensional outline of said artificial object within the photograph. The method then provides the user with an interface and interactively allows the user to define two-dimensional profiles of successive ones of said geometric parts. The profile may be drawn by the user or obtained from a library and its extent defined by sweep motions, the two short sweeps mentioned herein. The interface then interactively allows the user to sweep the profiles over the relevant geometric part within the image. This is the long sweep, which shows the computer where the 2D profile goes. The sweep is snapped to the appropriate 2D outline.

The method then generates three-dimensional model parts from existing detected edges of the corresponding geometric parts and the sweeping of the respective profile. The method then aligns the three-dimensional model parts in 3D space to form a consistent three-dimensional model. This alignment is a further snap stage.

FIG. 1B illustrates a man-made object of some complexity, but which is in fact made up of easily identifiable parts of fairly simple geometry, to which the procedure of FIG. 1A may be applied. Thus most machine-based programs would be hard pressed to identify a single object and would certainly have difficulty working out how the object extends in the third dimension, but a human would readily recognize an object of circular cross-section having a central stem, six branches and a hexagonal base, and each branch and the central stem having cups at the upper end which are aligned. The object is shown from two different angles.

Reference is now made to FIG. 2, which is an overview of how the sweep-snap technique of FIG. 1A may be applied to the object of FIG. 1B. FIG. 2(a) shows the input image with the object of interest. FIG. 2(b) shows extracted edges of the input object. Note that as well as the actual lines of the object the edge extractor has in fact picked up a line that belongs to the object's shadow.

FIG. 2(c) illustrates drawing a 2D profile of a primitive, a geometric part of the object having a constant or smoothly changing cross-section. FIG. 2(d) shows sweep-snapping to the 3D model of the primitive. FIG. 2(e) illustrates application of a geo-semantic constraint to achieve the final model of the object, as will be discussed in greater detail below. FIG. 2(f) illustrates what can be done subsequently with the 3D model. In this case the object has been edited by rotating each arm in a different direction.

In more detail, the interactive modeling process takes as input a single photo such as shown in FIG. 2(a). The goal is to generate a 3D model whose projection exactly matches the object in the image. Using a sweep-snap modeling technique the user constructs the whole object in parts. Implicitly, the user decomposes the object into simple parts, which are often semantic. Such decomposition is both easy and intuitive for users, but provides significant information for reconstructing a coherent 3D man-made object from its projection. The parts are expected to have typical geometric relations that can be exploited to guide the composition of the whole object.

Although the user interacts with the given photo, the actual modeling algorithm uses an outline image of the object as shown in FIG. 2(b). This image is created by edge detection and merging of continuous sequences of edge points into curves, as illustrated by the different colors in the figure.

To create one part, the user interactively fits a 3D primitive into the given photo. This operation is not trivial since the photo lacks the third dimension and fitting can be ambiguous. The challenge is to provide the interactive means to disambiguate such fitting. The sweep-snap technique of the present embodiments requires the user to generate a 3D model that roughly approximates the target part, and snaps to the extracted outline of the object.

The user thus defines the 3D approximate part by first drawing a 2D profile of the part and then its main axis. The former is done by drawing a 3D rectangle or circle directly over the image, while the latter is done by sweeping the profile along a straight or curved axis to form the 3D part. Defining the profile as well as the sweeping operation are simple tasks since they do not demand accuracy. The profile dimensions are guided by the object's left and right outlines as shown in FIG. 2(c). While sweeping, the 3D body of the part is also defined by snapping to these outlines. Thus, the part can be sketched quickly and casually by the user. FIG. 2(d) shows the result of sweeping the profile from (c) along one of the tubes of the object, in this case a menorah-style candelabrum. The sweep-snap operation is discussed in greater detail below. To compensate for perspective distortion, during this process the field of view angle of the camera taking the scene is estimated.

As the modeled parts are being gathered, the geometric relations among them serve (i) to assist in disambiguating and defining the depth dimension and (ii) to optimize the positioning of the parts. These geometric relations include parallel, orthogonal, collinear and coplanar parts. Most of these are automatically inferred from the positioning of the parts, but the user can also specify the constraints for the selected parts manually. The present embodiments optimize these geo-semantic constraints while taking into account the snapping of the 3D geometry to the object's outlines and the user's sweeping input. The complete model with geo-semantic relations is shown in FIG. 2(e). The geo-semantic relations not only help define the 3D model, but once computed, they remain encoded as part of the 3D representation. Such representation supports smart (semantic) editing of the 3D model, as demonstrated in FIG. 2(f) and other figures herein.
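By way of non-limiting illustration only, the automatic inference of parallel and orthogonal relations from the positioning of the parts may be sketched as a test of whether a pair of part axes is close to satisfying a relation. The Python sketch below is an assumption made for illustration: the function name `infer_axis_constraints`, the representation of each part by a non-zero 3D axis direction, and the 10 degree tolerance are hypothetical and are not specified by the embodiments.

```python
import math

def _unit(v):
    """Normalize a non-zero 3D vector."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def infer_axis_constraints(axis_a, axis_b, angle_tol_deg=10.0):
    """Return the relations that two part axes satisfy or nearly satisfy.

    angle_tol_deg is an illustrative threshold for "close to satisfying".
    """
    a, b = _unit(axis_a), _unit(axis_b)
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    # Angle between the two undirected axes, in the range [0, 90] degrees.
    angle = math.degrees(math.acos(abs(dot)))
    found = []
    if angle <= angle_tol_deg:
        found.append("parallel")
    if 90.0 - angle <= angle_tol_deg:
        found.append("orthogonal")
    return found
```

Relations detected in this manner could then be recorded with the parts and passed to the optimization, with relations specified manually by the user added to the same list.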

Single Primitive Fitting

The main challenge in image-guided modeling of a 3D part is to disambiguate the observed subject and infer the missing depth dimension. Directly fitting a 3D object into the image requires many geometric hints to constrain the non-linear optimization problem [27]. The present embodiments explicitly guide the 3D inference with simple user interaction. The sweep-snap modeling tool consists of two stages. In the first, the user draws a 2D profile, assisting the inference by explicitly defining its position in 3D. In the second, the user sweeps the profile to implicitly define a volumetric part.

Sweep-snap relies on snapping of primitives to object outlines created from image edges. To extract the image edges and build candidate object outlines, the present embodiments adopt a method for hierarchical edge feature extraction based on spectral clustering [2]. Then, a technique is applied to link the detected edge pixels into continuous point sequences [6], each shown in a different color in FIG. 2(b) and FIG. 3(a). To each detected edge pixel, the process associates an edge orientation computed in its 5×5 neighborhood. In the following, we first describe the sweep-snap technique for generalized cylinders and then briefly show how it extends to the simpler case of the generalized cuboid.

Reference is now made to FIGS. 3a-3e, which illustrate the sweep-snap process of the present embodiments on an exemplary curved cone primitive. The modeling process of a primitive comprises defining a 2D profile and sweeping the profile along the primitive using the main axis of the primitive.

Profile. In a first stage, the user draws the 2D profile of the generalized cylinder, usually at one end of the shape. This is illustrated in FIG. 3, where (a) is the input image with detected outlines. The task is to draw a 2D profile correctly oriented in 3D. This can be regarded as positioning a disk in 3D by drawing its projection in 2D. To simplify this task, we assume that the disk is a circle, thus reducing the number of unknown parameters. Later, the circular disk can be warped into an elliptical one based on the 3D reconstruction. The drawing of a circular disk is accomplished by drawing two straight lines over the image, see FIG. 3(b). The first line defines the major diameter of the disk, and then the second line is dragged to the end of the minor diameter. This forms an ellipse in image space that matches the projection of a circular disk, see FIG. 3(c). The depth value of the disk is set to 0. The normal direction and radius of the disk are assigned according to the length and orientation of the two diameters of the elliptical projection.
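By way of illustration only, the assignment of the radius and normal of the disk from the two strokes may be sketched as follows. The sketch assumes an orthographic approximation, under which a circle of radius r tilted by an angle t projects to an ellipse with semi-axes r and r·cos(t); the function name disk_from_strokes and its parameters are hypothetical, and the sign of the tilt remains ambiguous as with any single projection.

```python
import math

def disk_from_strokes(major_a, major_b, minor_end):
    """Return (center, radius, normal) of a circular disk from its projected ellipse.

    major_a, major_b: image-space endpoints of the major-diameter stroke.
    minor_end: image-space point at one end of the minor diameter.
    Assumption: orthographic approximation (not stated in the text).
    """
    cx = (major_a[0] + major_b[0]) / 2.0
    cy = (major_a[1] + major_b[1]) / 2.0
    radius = math.dist(major_a, major_b) / 2.0
    semi_minor = math.dist((cx, cy), minor_end)
    if radius == 0.0 or semi_minor == 0.0:
        raise ValueError("degenerate strokes: both diameters must be non-zero")
    # A circle tilted by angle t projects to semi-axes (r, r * cos t).
    cos_tilt = min(1.0, semi_minor / radius)
    sin_tilt = math.sqrt(1.0 - cos_tilt ** 2)
    # Unit direction of the minor axis in the image plane.
    mx = (minor_end[0] - cx) / semi_minor
    my = (minor_end[1] - cy) / semi_minor
    # The normal projects onto the minor axis; its depth component is cos t.
    normal = (sin_tilt * mx, sin_tilt * my, cos_tilt)
    center = (cx, cy, 0.0)  # the depth of the disk is set to 0, as in the text
    return center, radius, normal
```

A nearly circular pair of strokes yields a normal pointing almost directly at the viewer, whereas a thin ellipse yields a normal lying almost in the image plane.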

Sweeping. Once the base profile is ready, in the second stage, the user sweeps it along a curve that approximates the main axis of the 3D part. In general, this curve should be perpendicular to the profile of the 3D primitive, as indicated by blue arrows in FIG. 3(c). As the curve is drawn, copies of the profile are placed along the curve, and each of them is snapped to the object's outline.

During drawing, the axis curve is sampled in image space at uniform intervals of five pixels, producing sample points A₀, . . . , A_(N). Then, at each sampled point A_(i), a copy of the profile is fit, centered around the curve. The normal of the profile is aligned with the orientation of the curve at A_(i), and its diameter is adjusted to meet the object's outlines. Together, the adjusted copies of the profile form a discrete set of slices along the generalized cylinder, see FIG. 3(e).
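As a non-limiting sketch, the uniform sampling of the drawn axis curve may be carried out by walking the stroke as a polyline and emitting a point every five pixels of arc length. The function name resample_polyline and the representation of the stroke as a list of (x, y) tuples are assumptions made for illustration.

```python
import math

def resample_polyline(points, step=5.0):
    """Resample a 2D polyline at uniform arc-length intervals of `step` pixels."""
    samples = [points[0]]
    carried = 0.0  # arc length travelled since the last emitted sample
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        if seg == 0.0:
            continue  # skip repeated input points
        d = step - carried  # distance along this segment to the next sample
        while d <= seg:
            t = d / seg
            samples.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            d += step
        carried = seg - (d - step)
    return samples
```

The orientation of the curve at each sample A_(i) may then be approximated, for example, by the direction from A_(i−1) to A_(i+1).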

At each point A_(i), we first copy the profile from A_(i−1) and translate it to A_(i). Then we rotate it to account for the bending of the curve. Now, we consider the two tips of the profile, denoted by p_(i)⁰, p_(i)¹ and indicated by yellow points in FIG. 3(d). For each contour point p_(i)^(j), j∈{0,1}, we cast a 2D ray from point A_(i) along the diameter of the profile, through p_(i)^(j), seeking an intersection with an image outline.

Finding the correct intersection of the ray with an image outline is somewhat challenging. The image may contain many edges in the vicinity of the new profile. The closest one is not necessarily the correct one, e.g. when hitting occlusion edges. In other cases, the correct edges may be missing altogether. To deal with these we first limit the search for an intersection to a fixed interval, the size of which is governed by limiting the diameter change of adjacent profiles not to exceed 20% of the length. Second, we search for an intersecting outline that is close to perpendicular to the ray. If the angle between the ray and the outline is larger than π/3 the candidate intersection is discarded.

When an intersection is found, the position of the contour point p_(i)^(j) is snapped to the intersection position. If both contour points of the profile are snapped, one may adjust the location of A_(i) to lie at their midpoint. If only one side is successfully snapped, the length of the snapped side may be mirrored to the other side and the other contour point may be moved accordingly. Lastly, if neither of the two contour points is snapped, the size of the previous profile is maintained.

Reference is now made to FIG. 4, which shows a series of geometric primitives that may be included in embodiments of the present invention. The arrows indicate a three-stroke paradigm that can be used with each primitive to indicate translations of the basic primitive.
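The snapping rules for the two contour points of a profile copy, described above in connection with the sweeping stage, may be sketched as follows. The function name snap_profile and the use of None to mark a ray that found no acceptable intersection are assumptions made for illustration; the ray casting itself is outside the sketch.

```python
def snap_profile(center, tips, hits):
    """Apply the snapping rules to one profile copy.

    center: (x, y) of the sample point A_i.
    tips:   the two contour points copied from the previous profile.
    hits:   for each tip, the accepted outline intersection, or None.
    Returns the updated (center, tips).
    """
    h0, h1 = hits
    if h0 is not None and h1 is not None:
        # Both sides snapped: move A_i to the midpoint of the two hits.
        center = ((h0[0] + h1[0]) / 2.0, (h0[1] + h1[1]) / 2.0)
        return center, (h0, h1)
    if h0 is not None:
        # Only the first side snapped: mirror its length about the center.
        mirrored = (2.0 * center[0] - h0[0], 2.0 * center[1] - h0[1])
        return center, (h0, mirrored)
    if h1 is not None:
        # Only the second side snapped: mirror its length about the center.
        mirrored = (2.0 * center[0] - h1[0], 2.0 * center[1] - h1[1])
        return center, (mirrored, h1)
    # Neither side snapped: the copied tips already carry the previous size.
    return center, tips
```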

Numerous primitives can be used. Generalized cuboids are modeled in a similar manner to generalized cylinders. The main difference lies in the first stage of modeling the profile. The two strokes that define the profile of a cuboid follow the two edges of the cuboid base instead of the diameters of the disk, as shown in the bottom row of FIG. 4 by the red and green lines. Simpler primitives such as spheroids or simple cubes are also supported by direct modeling in the present embodiments.

The above modeling steps follow user gestures closely, especially when modeling the profile. This provides more intelligent understanding of the shape but is less accurate. Therefore, after modeling each primitive, we apply a post-snapping stage to better fit the primitive to the image as well as correct the view. We search for small transformations (±10% of primitive size) that create a better fit of the primitive's projection to the edge curves that were snapped in the editing process. We also automatically refine the field of view angles (initialized to 45 degrees) after each modeling step for better fitting.

In many cases, the modeled object has some special properties, or priors, that can be used to constrain the modeling. For example, if we know that a given part has a straight spine, we can constrain the sweep to progress along a straight line. Similarly, we can constrain the sweep to preserve a constant or linearly changing profile radius. In this case, the detected radii are averaged or fitted to a line along the sweep. We can also constrain the profile to be a square or a circle. In fact, a single primitive can contain segments with different constraints: it can start with a straight axis and then bend, or use a constant radius only in a specific part. These constraints are extremely helpful when the edge detection results are poor. Lastly, we provide the possibility to interactively adjust the profile diameter locally, for instance in places where the outlines were not salient or were missing altogether.
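The constant-radius and linearly-changing-radius priors mentioned above may, for example, be applied to the detected radii as in the following sketch. It assumes that the profiles are evenly spaced along the sweep, so that the sample index can serve as the abscissa of the line fit; the function name constrain_radii and the mode strings are hypothetical.

```python
def constrain_radii(radii, mode):
    """Regularize the detected profile radii along one sweep.

    mode == "constant": replace every radius by the average.
    mode == "linear":   replace the radii by their least-squares line.
    """
    n = len(radii)
    mean_r = sum(radii) / n
    if mode == "constant":
        return [mean_r] * n
    if mode == "linear":
        mean_x = (n - 1) / 2.0
        sxx = sum((x - mean_x) ** 2 for x in range(n))
        if sxx == 0.0:
            return list(radii)  # a single sample cannot define a slope
        sxy = sum((x - mean_x) * (r - mean_r) for x, r in enumerate(radii))
        slope = sxy / sxx
        return [mean_r + slope * (x - mean_x) for x in range(n)]
    raise ValueError("mode must be 'constant' or 'linear'")
```

Because a single primitive may contain segments with different constraints, such a function would be applied per segment rather than to the whole sweep.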

To further ease the modeling interaction, the present embodiments may also provide a copy and paste tool. The user can drag a selected part that is already snapped over to a new location in the image and snap it again in the new position. While copying, the user can rotate, scale, or flip the part.

Inter-part Optimization

The technique described above generates parts that fit the object outlines. The positions of these parts in 3D are still ambiguous and inaccurate. However, as these parts are components of a coherent man-made object, they have certain geometric relations among them derived from the semantics of the object. Constraining the shape based on such geo-semantic inter-part relations allows modeling coherent shapes [9, 35, 21, 27].

A direct global optimization of the positioning of parts that considers their geo-semantic relations is computationally intensive and prone to falling into local minima, since each component has many degrees of freedom. In the present setting, however, the modeled components are also constrained to agree with some outlines of the image. This can significantly reduce the degrees of freedom of the parts. By considering the image constraints, the dimensionality of the optimization space can be lowered and local minima are avoided. In the following, we describe how we simplify the general problem and solve a rather lightweight optimization to respect the geo-semantic constraints among the sweep-snapped parts.

The key idea is that by fixing the projection of a part, its position and orientation can be determined by one or two depth values only. We first describe the method for simple parts that can be modeled by a single parameter, namely parts which were modeled along a straight axis. Generalized cylinders and cuboids with curved axes will later be approximated by two arbitrarily connected straight-axis primitives at the start and end of the shape.

Reference is made to FIG. 5, which is a simplified diagram showing an example for inferring geo-semantic constraints, based on (a) parallelism and (b) collinear axis endpoints. FIG. 5 illustrates concave cylinders. In FIG. 5a the two cylinders have parallel axes. In FIG. 5b the cylinders also have parallel axes but are not aligned next to each other.

The position and orientation of a straight-axis generalized cylinder i can be determined by two points we call anchors, C_(i,1) and C_(i,2), along its main axis, as shown for example in FIG. 5. Referring now to FIG. 6, in a similar way, a cuboid part can be represented by six anchors C_(i,j), j∈{1,...,6}, positioned at the center of each face. Every opposite pair of anchors defines one main axis of the cuboid. Even though four anchors are enough to fix the position and orientation of a cuboid, an embodiment uses six anchors to allow setting various geo-semantic constraints on this part.

As the user defines the 3D part i using three strokes for the three dimensions, as discussed above in respect of FIG. 1A, we can utilize the strokes, or sweeps, to define a 3D local orthogonal coordinate system for the part. First, we define the origin of the coordinate system at a reference point R_(i) on the part's projection. For a cuboid part we pick the point connecting the first and second of the user's strokes and for a cylinder we pick the point connecting the second and third strokes. Due to the internal orthogonality of the straight part, the profile of the part is perpendicular to the main axis. Therefore, we may use the endpoints of the user's strokes (after snapping them to the image) to define three points that together with R_(i) create an orthogonal system. These are the orange points and lines in FIG. 6. Note that this coordinate system is defined in camera coordinates. The x and y values of the end points are determined by the projection and their depth values can be found as a function of z_(i), the z value of R_(i), by using three orthogonality constraint equations.

Next, the positions of the anchor points C_(i,j) in world coordinates can be defined using the orthogonal local axes. This defines the structure of part i. Since the local axes depend only on the depth value z_(i) of the point R_(i), we can parameterize the positions of C_(i,j) as a function of z_(i): C_(i,j)=F_(i,j)(z_(i)). That is, the position and orientation of the whole part become a function of a single unknown z_(i). F_(i,j) has the form

${F_{i,j}( z_{i} )} = \frac{b}{a( {z_{i} + v} )}$

for each coordinate component, where a depends only on the x- and y-coordinates of the endpoints of the local axes, and b, v are decided by perspective parameters. They are different for each axis endpoint and for each coordinate component.

We may use the anchor points to define the geo-semantic relations among the parts. Specifically, we support six types of constraints: parallelism, orthogonality, collinear axis endpoints, overlapping axis endpoints, coplanar axis endpoints and coplanar axes. During the modeling phase, for each type, we test whether a pair of components is close to satisfying one of the above geo-semantic constraints, and if so, we add the constraint to our system. For example, for two cylinders with indices m and n, if the angle between the vectors (C_(m,1)−C_(m,2)) and (C_(n,1)−C_(n,2)) is smaller than 15 degrees, we may add a parallel constraint (C_(m,1)−C_(m,2))×(C_(n,1)−C_(n,2))=0 to our system of constraints. Similarly, if any three among the four anchors of two cylinders form a triangle containing an angle larger than 170 degrees, then we add a collinear axes constraint: (C₁−C₂)×(C₁−C₃)=0, as shown in FIG. 5. Internal constraints such as orthogonality and concentricity of the axes of a cuboid are also added to the system. Finally, the present modeling tool provides ways to manually enforce or revoke a constraint for selected primitive parts.
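The two automatic tests given as examples above, using the 15 degree and 170 degree thresholds, may be sketched as follows. The function names are hypothetical, anchors are assumed to be 3D tuples, and the treatment of axes as undirected (so that anti-parallel axis vectors also count as parallel) is an assumption of the sketch, made because the cross-product constraint is satisfied in both cases.

```python
import math

def _sub(p, q):
    return tuple(a - b for a, b in zip(p, q))

def _angle_deg(u, v):
    """Angle between two vectors in degrees; 0.0 if either is degenerate."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    cos_a = sum(a * b for a, b in zip(u, v)) / (nu * nv)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))

def nearly_parallel(c_m1, c_m2, c_n1, c_n2, tol_deg=15.0):
    """True if the axes of two cylinders are within tol_deg of parallel."""
    ang = _angle_deg(_sub(c_m1, c_m2), _sub(c_n1, c_n2))
    return min(ang, 180.0 - ang) < tol_deg  # axes treated as undirected

def nearly_collinear(p, q, r, min_deg=170.0):
    """True if the triangle p, q, r contains an angle larger than min_deg."""
    for a, b, c in ((p, q, r), (q, r, p), (r, p, q)):
        if _angle_deg(_sub(b, a), _sub(c, a)) > min_deg:
            return True
    return False
```

A pair of parts that passes such a test would contribute the corresponding cross-product equation to the system of constraints, subject to manual override by the user.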

FIG. 6 thus illustrates two cubes and shows how the present embodiments determine the coordinates C_(i,j) for the axis endpoints of a cuboid from the depth value z_(i) of the reference point R_(i).

Suppose we have defined p geo-semantic constraints G_(k) for a set of n components. Together with the objective function of fitting to the image outline, we define the following optimization system:

$\begin{matrix} \text{minimize } E = \sum_{i=1}^{n} w_{i} \left( \sum_{j=1}^{m_{i}} \left\| C_{i,j} - F_{i,j}(z_{i}) \right\|^{2} \right) & (1) \\ \text{subject to } G_{k}(C_{1,1}, \ldots, C_{n,m_{n}}), \quad k = 1, \ldots, p, & (2) \end{matrix}$

where m_(i) is the number of axes of the ith primitive part. We add weights w_(i) proportional to the radius of the base profile of each part and the length of its axis. Larger parts have more impact on the solution since typically larger parts are modeled more accurately. Intuitively, the first equation tries to fit the part's geometry (C_(i,j)) to the image outline and the user's gestures, while the second set of equations defines the geo-semantic constraints.
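For illustration, the weighted fitting energy E of Equation (1) may be evaluated as in the following sketch. The dictionary layout of a part and the names anchor_from_depth and fitting_energy are assumptions of the sketch; the per-coordinate coefficients a, b and v are taken as given, since their values follow from the projection and the perspective parameters.

```python
def anchor_from_depth(z, a, b, v):
    """Evaluate F(z) = b / (a * (z + v)) separately for each coordinate."""
    return tuple(bk / (ak * (z + vk)) for ak, bk, vk in zip(a, b, v))

def fitting_energy(parts):
    """Sum over parts of w_i times the squared anchor residuals.

    Each part is a dict with keys:
      "weight":  w_i
      "z":       depth z_i of the reference point
      "anchors": list of current anchor positions C_(i,j), as 3-tuples
      "coeffs":  list of (a, b, v) triples, one per anchor, each a 3-tuple
    """
    energy = 0.0
    for part in parts:
        residual = 0.0
        for anchor, (a, b, v) in zip(part["anchors"], part["coeffs"]):
            predicted = anchor_from_depth(part["z"], a, b, v)
            residual += sum((c - f) ** 2 for c, f in zip(anchor, predicted))
        energy += part["weight"] * residual
    return energy
```

When every anchor equals its prediction F_(i,j)(z_(i)), the energy is zero and only the constraints of Equation (2) remain to be satisfied.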

Solving for C_(i,j) and z_(i) together, we have a non-linear non-convex optimization problem with non-linear constraints. Such a system is very hard to solve directly without being trapped in local minima. Hence, we decompose the solution of this system into a two-step procedure. The first step tries to find a good initial position for all parts together by changing only their depth (governed by z_(i)) to adhere to the geo-semantic constraints. In the second step, the full system is solved, allowing the shape of the parts (C_(i,j)) to change as well.

In the first step, we modify the soft constraint in Equation (1) to a hard one, and replace C_(i,j) by F_(i,j)(z_(i)) in all equations. This means Equation (1) is trivially true and we are left with just the constraints in Equation (2). In effect, this means we fix the projection and find the optimal z_(i) fitting the geo-semantic constraints. This reduces the number of variables to n (z_(i), 1≤i≤n) and changes Equation (2) into an over-determined system, where each equation only contains two different variables.

We find the least squares solution z̄_(i), for example by conjugate gradient, with all z_(i) values initialized to 0.

This first step provides a good initial condition to find the optimal solution for C_(i,j), as it should be around the values F_(i,j)(z̄_(i)), fixing only small inconsistencies with the geo-semantic constraints. Hence, in the second step, we solve the full optimization of Equation (1) with the set of constraints in Equation (2), for example using an augmented Lagrangian method. Both steps are fast, and we are able to avoid local minima due to better initialization from the first step. This leads to an interactive rate optimization. Note that the nonlinearity of F_(i,j)(·) is due to the assumption of a perspective projection. However, we can approximate this projection linearly since we assume the change in z_(i) is small. This further increases the speed and stability of our solution.

Lastly, to handle parts with a non-straight axis, we first simplify the problem by assuming that the general axis lies on a plane. Second, we treat the part as being a blend of two straight-axis sub-parts, placed at the two ends of the part. The position of each of these sub-parts is determined by a single depth value in the optimization above, and the whole part is defined by connecting the two sub-parts with a general axis while constraining the profile snapping.

The Derivation of F

For a straight primitive with reference point R, we denote the three orange points in FIG. 6 by P_(m), m∈{1,2,3}; the order does not matter. Then we have three equations defined by orthogonality in world coordinates: {right arrow over (RP)}_(m)·{right arrow over (RP)}_(n)=0, where the pair (m, n)∈P={(1,2), (2,3), (3,1)}. We denote the world coordinates of P_(m) by (X_(m), Y_(m), Z_(m)), screen coordinates by (x_(m), y_(m)), and depth by z_(m). For R, they are (X_(r), Y_(r), Z_(r)), etc. So we can write the equations:

(X_(m)−X_(r))(X_(n)−X_(r))+(Y_(m)−Y_(r))(Y_(n)−Y_(r))+(Z_(m)−Z_(r))(Z_(n)−Z_(r))=0,

By inverse perspective transformation, we can change this to:

$\left( \frac{N x_{m}}{z_{m} + v} - \frac{N x_{r}}{z_{r} + v} \right)\left( \frac{N x_{n}}{z_{n} + v} - \frac{N x_{r}}{z_{r} + v} \right) + \left( \frac{N y_{m}}{z_{m} + v} - \frac{N y_{r}}{z_{r} + v} \right)\left( \frac{N y_{n}}{z_{n} + v} - \frac{N y_{r}}{z_{r} + v} \right) + \left( \frac{u}{z_{m} + v} - \frac{u}{z_{r} + v} \right)\left( \frac{u}{z_{n} + v} - \frac{u}{z_{r} + v} \right) = 0,$

where N, u, v are constant when the perspective parameters are fixed. Since the projection is fixed, x_(m), y_(m), x_(n), y_(n) are all fixed. The only variables are the depths z. To solve these equations, we first replace each z by z̄=z+v. By multiplying both sides by z̄_(m)z̄_(n)z̄_(r)², and expressing z̄_(m) in terms of z̄_(n), we get:

${{\overset{\_}{z}}_{m} = \frac{{( {{x_{m}x_{n}} + {y_{m}y_{n}} + c^{2}} ){\overset{\_}{z}}_{r}^{2}} - {( {{x_{m}x_{r}} + {y_{m}y_{r}} + c^{2}} ){\overset{\_}{z}}_{r}{\overset{\_}{z}}_{n}}}{{( {{x_{n}x_{r}} + {y_{n}y_{r}} + c^{2}} ){\overset{\_}{\overset{\_}{z}}}_{r}} - {( {x_{r}^{2} + y_{r}^{2} + c^{2}} ){\overset{\_}{z}}_{n}}}},$

where

$c = \frac{u}{N}.$
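As a numerical sanity check, offered only as a sketch, the expression for z̄_(m) above can be evaluated and the orthogonality condition verified. The check treats each point as the vector (x, y, c)/z̄, which is the form the expression implies once the common factor N is divided out; the function names and the sample values in the usage note are hypothetical.

```python
def shifted_depth_m(pm, pn, pr, zn, zr, c):
    """Shifted depth of P_m given those of P_n and R, per the expression above.

    pm, pn, pr: screen coordinates (x, y) of P_m, P_n and R.
    zn, zr:     shifted depths (z + v) of P_n and R.
    """
    def dot_c(s, t):
        return s[0] * t[0] + s[1] * t[1] + c * c

    numerator = dot_c(pm, pn) * zr ** 2 - dot_c(pm, pr) * zr * zn
    denominator = dot_c(pn, pr) * zr - dot_c(pr, pr) * zn
    return numerator / denominator

def back_project(p, z_shifted, c):
    """Point along the viewing ray, up to the common scale factor N."""
    return (p[0] / z_shifted, p[1] / z_shifted, c / z_shifted)

def orthogonality_residual(pm, pn, pr, zm, zn, zr, c):
    """Dot product of (P_m - R) and (P_n - R); zero when orthogonal."""
    m = back_project(pm, zm, c)
    n = back_project(pn, zn, c)
    r = back_project(pr, zr, c)
    return sum((mk - rk) * (nk - rk) for mk, nk, rk in zip(m, n, r))
```

For example, with screen points (1, 0), (0, 1) and (0, 0), shifted depths of 3 for P_(n) and 2 for R, and c=1, the function returns a shifted depth of 2 for P_(m), and the residual vanishes.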

In this representation we replace the two unknown z by the third, and solve for the third z as a function of z̄_(r). Let C_(s,t)=(x_(s)x_(t)+y_(s)y_(t)+c²), where s and t can each be 1, 2, 3 or r. We directly give the representation of z̄_(m):

$\bar{z}_{m} = \pm \frac{C_{r,m}^{2}C_{n,l} - C_{r,l}C_{r,m}C_{n,m} - C_{r,n}C_{r,m}C_{l,m} + C_{r,r}C_{l,m}C_{n,m}}{C_{r,r}^{2} - C_{r,r}C_{r,l}C_{r,n}}\,\bar{z}_{r}.$

Due to symmetry, m, n, l can be any permutation of 1, 2, 3. Note that the two solutions exactly match the ambiguity of perspective projection of the primitive. We examine the two solutions and use the one that can generate a projection that fits the image edges better. This has the form of z̄_(m)=az̄_(r), which means z̄_(m) is linear in z̄_(r). We can easily compute the world coordinates (X_(m), Y_(m), Z_(m)) as a function of z_(r) by inverse perspective transformation. Since the axis endpoints C_(i,j) are linear combinations of P_(m), we can also determine each of their coordinates as a function of z_(r) in the form of

$\frac{b}{a( {s_{r} + v} )},$

where b, v are decided by the perspective, and a is decided by the abovederivation.

Experimental Results

The sweep-snap interactive technique referred to herein is currently implemented in C++. The system provides an outline view for sweep-snap, a solid model view and a texture view for checking the model and image editing. The user can choose between “cuboid”, “cylinder” or “sphere” primitives using a button or key shortcut. The system also provides conventional menu selection, view control and deformation tools. Most of the examples given below were modeled in a few minutes or less. The modeling process is intuitive and fluent, so that even an untrained user with little experience of the technique can handle it. Editing and repositioning the object requires activities which would be familiar to users of other parametric editing techniques.

Once the objects have been modeled, the user may map the texture from the image onto the object, as exemplified in the bottom row in FIG. 7. By projecting a vertex of the mesh to the image plane, one can obtain the 2D coordinates of the vertex on the image. These are used as texture coordinates. Alpha matting on the foreground image is computed and mapped as a texture onto the model to eliminate the effect of background pixels. As there is no information regarding the back of the object, we simply use a symmetry assumption and mirror the front texture content to the back. At each of the profile layers of the model, one can assign the same texture coordinate to the two vertices which are mirrored symmetrically about the center of the layer. Note that on the two sides of the object, there may be centro-symmetric pairs that both face away from the camera. To deal with this situation, one may treat the texture associated with these vertices as holes, and fill them with an image completion technique [3] from the texture.
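The assignment of texture coordinates by projection, with the front coordinate shared by the mirrored back vertex, may be sketched as follows. The sketch assumes a pinhole camera with the principal point at the image center, a focal length expressed in pixels and vertices given in camera coordinates with positive depth; these conventions, and the function names, are assumptions made for illustration.

```python
def project_to_uv(vertex, focal, width, height):
    """Project a camera-space vertex to normalized texture coordinates."""
    x, y, z = vertex
    if z <= 0.0:
        raise ValueError("vertex must lie in front of the camera")
    px = focal * x / z + width / 2.0   # pixel column
    py = focal * y / z + height / 2.0  # pixel row
    return (px / width, py / height)

def layer_texture_coords(mirrored_pairs, focal, width, height):
    """Assign texture coordinates for one profile layer.

    mirrored_pairs: list of (front_vertex, back_vertex) tuples, where the
    back vertex mirrors the front one about the center of the layer.
    Both vertices of a pair receive the coordinate of the front vertex.
    """
    coords = []
    for front, _back in mirrored_pairs:
        uv = project_to_uv(front, focal, width, height)
        coords.append((uv, uv))
    return coords
```

Pairs in which both vertices face away from the camera would not be handled by this mirroring and would instead be filled by image completion, as described above.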

Modeling from single image and editing. The approximated 3D model and its texture allow semantic image editing. Before editing, the image of the 3D model is cut out from the photo, leaving a black hole (as demonstrated in FIG. 1) which is filled again using an image completion technique [3].

FIG. 2(f), referred to above, demonstrates a menorah-style candelabrum where each arm is rotated by a different angle. All the candleholders have the same size, but due to the oblique view they appear at a different size in the photo. During modeling, to ensure this effect, one copies each candleholder and fits each one to the image, while requiring that they lie on the same plane and that their 3D sizes be the same. This efficiently recovers the true 3D position and shape of each part.

Reference is now made to FIG. 7. A series of man-made objects are shown in photographs in the top row. The second row shows the objects of the first row having been modeled according to the present embodiments and rotated or otherwise repositioned. The bottom row shows the objects reinserted into the original photograph following the rotation or repositioning of the second row.

Thus, in the middle row we show the extracted 3D models, repositioned and, in the third row, inserted back into the photo. The rightmost column shows the modeling and repositioning of three objects in one complex photo. Note that the Menorah has been rotated as well as translated on the ground plane.

Reference is now made to FIG. 8, which shows how, even though the present embodiments require only one photograph, nevertheless a model or part of a model can be extracted from one photograph and subsequently inserted into another photograph or integrated into model parts extracted from the other photograph.

Modeling the Obelisk in Paris from two photos as per the above involves: (a) taking the base of the Obelisk from a close view and thus capturing detail; (b) transporting the partial 3D model from the close view to a more distant view, where part of the base is occluded, to complete the modeling; (c) blending the texture of the transported part into the region it occupies and rotating the whole; and (d) obtaining an end result in which details of the base are visible in a close-up view of the model of the Obelisk, when in fact most of the Obelisk is taken from the distant photograph.

More particularly, in FIG. 8 we show a case where two input photos are used to model one object: the Obelisk in Paris. First, the base of the Obelisk is modeled from a close-up view in (a), where more details can be captured. Then, the partial 3D model is transported to another photo where the entire Obelisk is visible, but the base is occluded. Similar to a copy and paste procedure, the user positions the extracted base part inside the image, and the part snaps to the image contours in (b). The user can then continue the modeling process. The texture of the transported part is blended to the color of the region it occupies to maintain consistency, as shown in the rotated view (c). The details of the base are still visible in the close-up view (d) of the model of the Obelisk.

Reference is now made to FIG. 9, which shows three sets of three images. In each set the first image is the original photograph. In the second image parts of the object shown in orange are added from the model, to represent original parts which have been replicated and rotated or deformed, and in the third image the change is integrated into the original object. Thus in the first set of images, the tap gains two extra handles. The street light gains two extra lamps. The candelabrum gains two extra holders, and the samovar gains extra handles and knobs.

Thus, FIG. 9 shows four examples of modeling and editing. These examples show part-level editing, where some parts of the objects (highlighted in golden colors) are replicated and copied, and possibly rotated, to enhance and enrich the shape. The top left shows modeling a tap, changing its rotary switch to cruciform and rotating it. Then, the whole tap is copied and attached to another side of the wall. The bottom left shows a candleholder being modeled and rotated, with its two arms duplicated to a perpendicular position. We also enlarge the middle holder. The top right shows a street lamp with duplicated lamps moved to a lower position, rotated and copied to other positions in the street. The bottom right shows a samovar rotated with multiple copies of its handles pasted across its surface.

Reference is now made to FIG. 10, which shows a tea pot and a telescope and five different editing results of the original photograph. The leftmost images are the source, and the variations applied to the parts are non-linear variations.

FIG. 10 shows a variety of editing possibilities of two objects. Note the non-uniform scaling applied to the different parts.

In FIG. 11 we show a photograph with a collection of objects that were modeled and copied from other photos. The modeling and editing time of each example is listed in Table 1, below, as well as the number of manually provided geo-semantic constraints. In general, an object in oblique view will need more manual constraints. Most of these constraints are coplanar axes, which are ambiguous to automatic inference.

Modeling from sketch. Reference is now made to FIG. 12, which illustrates how the same model may emerge using sketches as input in place of photographs. Input sketches are taken from [27].

Recently, Shtof et al. [27] presented a method to model objects from 2D sketches. In FIG. 12 we show examples of modeling several of the sketches they used. Since a sketch is typically inaccurate, in cases where the axis of the primitive differs too much from its modeling location, we ignore the boundary snapping in our geo-semantic optimization. Our modeling time (60 s on average) is significantly lower than the reported time of their technique (130 s on average).

TABLE 1
Modeling and editing times (in seconds) and the number of manually provided geo-semantic constraints (either adding or removing) for each example.

Figure   Example     Time (s)    Constraints
2        Menorah     80 + 25     2
7        (a)         15          0
7        (b)         20          2
7        (c)         35          1
7        (d)         30          1
7        (e)         65 + 35     1
8        Obelisk     20          0
9        tap         30 + 25     2
9        holder      45 + 35     1
9        lamp        40 + 50     1
9        samovar     50 + 20     1
10       Pot         15 + 30     0
10       telescope   100 + 30    2
11       trumpet     80          1
11       handle      30          1
11       horn        60          1

The photographs themselves usually have some distortions from ideal perspective projection, especially when an object is too close or taken from a wide angle camera. In this case, fisheye correction should be applied before modeling.

Conclusion

We present an interactive technique to model 3D man-made objects from a single photograph by combining the cognitive ability of humans with the computational accuracy of the machine. The results show that the present embodiments may model a large variety of man-made objects from natural images or photographs, as well as modeling objects from sketches. The modeled objects may be used to achieve semantic editing and composition of images, as well as creating simple 3D scenes by copying items from photographs. One may extend the types of supported primitives to allow modeling of free shapes of natural objects. It is also possible to add symmetry and smoothness constraints on the shapes. Sweep-snap can also be extended for modeling from multi-view images or video without the help of depth data. In terms of applications, we demonstrate editing and manipulation of geometry and furthermore, the recovered 3D model and surface normals can be used to achieve re-lighting and material editing.

It is expected that during the life of a patent maturing from this application many relevant pulse shaping and symbol decoding technologies will be developed and the scope of the corresponding terms in the present description are intended to include all such new technologies a priori.

The terms “comprises”, “comprising”, “includes”, “including”, “having” and their conjugates mean “including but not limited to”.

The term “consisting of” means “including and limited to”.

As used herein, the singular form “a”, “an” and “the” include plural references unless the context clearly dictates otherwise.

It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination or as suitable in any other described embodiment of the invention. Certain features described in the context of various embodiments are not to be considered essential features of those embodiments, unless the embodiment is inoperative without those elements.

Although the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims.

All publications, patents and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention. To the extent that section headings are used, they should not be construed as necessarily limiting.

REFERENCES

[1] Angelidi, A., Canif, M., Wyvill, G., and King, S. 2004.Swirling-sweepers: Constant-volume modeling. In Computer Graphics andApplications, 2004. PG 2004. Proceedings. 12th Pacific Conference on,IEEE, 10-15.

[2] Arbelaez, P., Maire, M., Fowlkes, C., and Malik, J. 2011. Contour detection and hierarchical image segmentation. Pattern Analysis and Machine Intelligence, IEEE Transactions on 33, 5, 898-916.

[3] Barnes, C., Shechtman, E., Finkelstein, A., and Goldman, D. 2009. Patchmatch: a randomized correspondence algorithm for structural image editing. ACM Transactions on Graphics-TOG 28, 3, 24.

[4] Barrett, W., and Cheney, A. 2002. Object-based image editing. In ACM Transactions on Graphics (TOG), vol. 21, ACM, 777-784.

[5] Cheng, M., Zhang, F., Mitra, N., Huang, X., and Hu, S. 2010. Repfinder: finding approximately repeated scene elements for image editing. ACM Transactions on Graphics (TOG) 29, 4, 83.

[6] Cheng, M. 2009. Curve structure extraction for cartoon images. In Proceedings of The 5th Joint Conference on Harmonious Human Machine Environment, 13-25.

[7] Choi, B., and Lee, C. 1990. Sweep surfaces modelling via coordinate transformation and blending. Computer-Aided Design 22, 2, 87-96.

[8] Eitz, M., Sorkine, O., and Alexa, M. 2007. Sketch based image deformation. In Proceedings of Vision, Modeling and Visualization (VMV), 135-142.

[9] Gal, R., Sorkine, O., Mitra, N., and Cohen-Or, D. 2009. iwires: an analyze-and-edit approach to shape manipulation. In ACM Transactions on Graphics (TOG), vol. 28, ACM, 33.

[10] Gingold, Y., Igarashi, T., and Zorin, D. 2009. Structured annotations for 2d-to-3d modeling. In ACM Transactions on Graphics (TOG), vol. 28, ACM, 148.

[11] Goldberg, C., Chen, T., Zhang, F., Shamir, A., and Hu, S. 2012. Data-driven object manipulation in images. In Computer Graphics Forum, vol. 31, Wiley Online Library, 265-274.

[12] Hyun, D., Yoon, S., Chang, J., Seong, J., Kim, M., and Jüttler, B. 2005. Sweep-based human deformation. The Visual Computer 21, 8, 542-550.

[13] Igarashi, T., Kawachiya, S., Tanaka, H., and Matsuoka, S. 1998. Pegasus: a drawing system for rapid geometric design. In CHI 98 conference summary on Human factors in computing systems, ACM, 24-25.

[14] Igarashi, T., Matsuoka, S., and Tanaka, H. 1999. Teddy: a sketching interface for 3d freeform design. In Proceedings of the 26th annual conference on Computer graphics and interactive techniques, ACM Press/Addison-Wesley Publishing Co., 409-416.

[15] Jiang, N., Tan, P., and Cheong, L. 2009. Symmetric architecture modeling with a single image. ACM Transactions on Graphics (TOG) 28, 5, 113.

[16] Kaplan, M., and Cohen, E. 2006. Producing models from drawings of curved surfaces. In EUROGRAPHICS workshop on sketch-based interfaces and modeling, The Eurographics Association, 51-58.

[17] Kim, M., Park, E., and Lee, H. 1994. Modelling and animation of generalized cylinders with variable radius offset space curves. The Journal of Visualization and Computer Animation 5, 4, 189-207.

[18] Lalonde, J., Hoiem, D., Efros, A., Rother, C., Winn, J., and Criminisi, A. 2007. Photo clip art. In ACM Transactions on Graphics (TOG), vol. 26, ACM, 3.

[19] Lau, M., Saul, G., Mitani, J., and Igarashi, T. 2010. Modeling-in-context: User design of complementary objects with a single photo. In Proceedings of the Seventh Sketch-Based Interfaces and Modeling Symposium, Eurographics Association, 17-24.

[20] Lee, J. 2005. Modeling generalized cylinders using direction map representation. Computer-Aided Design 37, 8, 837-846.

[21] Li, Y., Wu, X., Chrysanthou, Y., Sharf, A., Cohen-Or, D., and Mitra, N. 2011. Globfit: Consistently fitting primitives by discovering global relations. In ACM Transactions on Graphics (TOG), vol. 30, ACM, 52.

[22] Murugappan, S., Liu, H., Ramani, K., et al. 2012. Shape-it-up: Hand gesture based creative expression of 3d shapes using intelligent generalized cylinders. Computer-Aided Design.

[23] Oh, B., Chen, M., Dorsey, J., and Durand, F. 2001. Image-based modeling and photo editing. In Proceedings of the 28th annual conference on Computer graphics and interactive techniques, ACM, 433-442.

[24] Olsen, L., Samavati, F., Sousa, M., and Jorge, J. 2009. Sketch-based modeling: A survey. Computers & Graphics 33, 1, 85-103.

[25] Russell, B., and Torralba, A. 2009. Building a database of 3d scenes from user annotations. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, IEEE, 2711-2718.

[26] Seitz, S., Curless, B., Diebel, J., Scharstein, D., and Szeliski, R. 2006. A comparison and evaluation of multi-view stereo reconstruction algorithms. In Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, vol. 1, IEEE, 519-528.

[27] Shtof, A., Agathos, A., Gingold, Y., Shamir, A., and Cohen-Or, D. 2013. Geosemantic snapping for sketch-based modeling. In Eurographics.

[28] Snavely, N. 2011. Scene reconstruction and visualization from internet photo collections: A survey. IPSJ Transactions on Computer Vision and Applications 3, 0, 44-66.

[29] Tsang, S., Balakrishnan, R., Singh, K., and Ranjan, A. 2004. A suggestive interface for image guided 3d sketching. In Proceedings of the SIGCHI conference on Human factors in computing systems, ACM, 591-598.

[30] Xu, K., Zheng, H., Zhang, H., Cohen-Or, D., Liu, L., and Xiong, Y. 2011. Photo-inspired model-driven 3d object modeling. In ACM Transactions on Graphics (TOG), vol. 30, ACM, 80.

[31] Xu, K., Zhang, H., Cohen-Or, D., and Chen, B. 2012. Fit and diverse: Set evolution for inspiring 3d shape galleries. ACM Transactions on Graphics (TOG) 31, 4, 57.

[32] Xue, T., Liu, J., and Tang, X. 2010. Object cut: Complex 3d object reconstruction through line drawing separation. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, IEEE, 1149-1156.

[33] Yoon, S., and Kim, M. 2006. Sweep-based freeform deformations. In Computer Graphics Forum, vol. 25, Wiley Online Library, 487-496.

[34] Zeleznik, R., Herndon, K., and Hughes, J. 2007. Sketch: an interface for sketching 3d scenes. In ACM SIGGRAPH 2007 courses, ACM, 19.

[35] Zheng, Y., Fu, H., Cohen-Or, D., Au, O., and Tai, C. 2011. Component-wise controllers for structure-preserving shape manipulation. In Computer Graphics Forum, vol. 30, Wiley Online Library, 563-572.

[36] Zheng, Y., Chen, X., Cheng, M., Zhou, K., Hu, S., and Mitra, N. 2012. Interactive images: cuboid proxies for smart image manipulation. ACM Transactions on Graphics (TOG) 31, 4, 99.

[37] Zhou, S., Fu, H., Liu, L., Cohen-Or, D., and Han, X. 2010. Parametric reshaping of human bodies in images. ACM Transactions on Graphics (TOG) 29, 4, 126.

What is claimed is:
 1. A method of obtaining a three-dimensional digital model of an artificial object, made up of a plurality of geometric primitives, the artificial object being in a single two-dimensional photograph or drawing, the method comprising: defining a two-dimensional outline of said artificial object within the photograph; interactively allowing a user to define cross-sectional profiles of successive ones of said geometric primitives, said cross-sectional profiles defining a third dimension; interactively allowing a user to provide sweep input to sweep respective defined cross-sectional profiles over an extent of a corresponding one of said geometric primitives within the image, said sweeping generating successive three-dimensional model primitives from existing detected edges of said corresponding geometric primitives and said sweeping of said respective profile; and aligning said plurality of three-dimensional model primitives to form said three-dimensional model.
 2. The method of claim 1, comprising interactively allowing said user to explicitly define three dimensions of the geometric primitive using three sweep motions, wherein a first two of said three sweeps define a first and second dimension of said cross-sectional profile and a third sweep defines a main axis of the geometric primitive.
 3. The method of claim 1, comprising, upon the user sweeping the two-dimensional profile over a respective one of said geometric primitives, dynamically adjusting said two-dimensional profile using a pictorial context on the photograph and automatically snapping photograph lines to said profile.
 4. The method of claim 3, wherein said snapping allows said three-dimensional model to include three-dimensional primitives that adhere to the object in the photographs, while maintaining global constraints between said plurality of three-dimensional model primitives composing said object.
 5. The method of claim 4, further comprising optimizing said global constraints while taking into account said snapping and said sweep input.
 6. The method of claim 4, further comprising a post snapping fit improvement of better fitting the primitive to the image, said better fitting comprising searching for transformations within ±10% of primitive size, that create a better fit of the primitive's projection to said profile.
 7. The method of claim 1, wherein said defining said two-dimensional outline comprises edge detecting.
 8. The method of claim 1, further comprising estimating a field of view angle from which said photograph was taken in order to estimate and compensate for distortion of said primitives within said photograph.
 9. The method of claim 1, further comprising using relationships between said primitives in order to define global constraints for said object.
 10. The method of claim 9, further comprising obtaining geo-semantic relations between said primitives to define said three-dimensional digital model, and encoding said relations as part of said model.
 11. The method of claim 1, further comprising inserting said three-dimensional digital model into a second photograph.
 12. The method of claim 1, further comprising extracting a texture from said photograph and applying said texture to sides of said three-dimensional model not visible in said photograph.
 13. The method of claim 1, wherein said defining said cross-sectional profiles comprises defining a shape and then distorting said shape to correspond to a three-dimensional orientation angle.
 14. The method of claim 4, comprising applying different constraints to different parts respectively of a given one of said geometric primitives, or locally modifying different parts respectively of a given one of said geometric primitives.
 15. The method of claim 2, comprising snapping said first two user sweep motions to said photograph lines, using the endpoints of said first two user sweep motions along with an anchor point on a respective primitive to create a three-dimensional orthogonal system for a respective primitive.
 16. The method of claim 1, further comprising supporting a constraint, said constraint being one member of the group consisting of: parallelism, orthogonality, collinear axis endpoints, overlapping axis endpoints, coplanar axis endpoints and coplanar axes, and for said member testing whether a pair of components is close to satisfying said member, and if said member is satisfied or close to satisfied then adding said constraint to a respective one of said primitives.
 17. The method of claim 1, wherein said aligning said three-dimensional primitives comprises finding an initial position for all primitives together by changing only their depth to adhere to geo-semantic constraints, followed by modifying the shape of the primitives.
 18. A user interface for carrying out the method of claim 1, the user interface comprising an outline view of a current photograph on which view to carry out interactive sweeping to define cross sections of respective primitives and on which to snap said cross-sections.
 19. The user interface of claim 18, further comprising a solid model view and a texture view respectively of said current photograph, and selectability for user selection between different basic cross-sectional shapes.
 20. A method of digitally forming a three-dimensional geometric primitive from a two-dimensional geometric primitive from a two-dimensional photograph or drawing, comprising: interactively obtaining user input to draw a two-dimensional cross section of the primitive and then using further user input to sweep the cross-section over a length of the primitive.
 21. A method of forming a derivation of a photograph or drawing, the photograph incorporating a two-dimensional representation of a three-dimensional object, said three-dimensional object comprising geometric primitives, the two-dimensional representation being a rotation or other transformation of an original two-dimensional representation, the rotation being formed by: carrying out the method of claim 1 to form a three-dimensional model of said original two-dimensional representation; rotating or otherwise transforming said three-dimensional model; and projecting said rotated or otherwise transformed three-dimensional model onto a two-dimensional surface to form said derivation.
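
By way of non-limiting illustration only, and forming no part of the claims, the near-satisfaction test recited in claim 16 may be sketched for two of the listed constraints, parallelism and orthogonality. The sketch assumes that each primitive is summarized by a non-zero main-axis direction vector and that "close to satisfying" means within an angular tolerance. The names detect_constraints and angle_between_deg and the 10 degree tolerance are hypothetical and are not taken from the specification.

```python
import math
from itertools import combinations

# Hypothetical tolerance: the text does not state how close is "close".
ANGLE_TOL_DEG = 10.0


def angle_between_deg(u, v):
    """Unsigned angle in degrees between two non-zero 3D direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


def detect_constraints(axes, tol_deg=ANGLE_TOL_DEG):
    """Return (name_a, name_b, kind) for each pair of main axes that is
    close to parallel or close to orthogonal.

    `axes` maps a primitive name to its main-axis direction vector.
    """
    found = []
    for (name_a, u), (name_b, v) in combinations(sorted(axes.items()), 2):
        angle = angle_between_deg(u, v)
        # Axes are undirected, so 0 and 180 degrees both count as parallel.
        if min(angle, 180.0 - angle) <= tol_deg:
            found.append((name_a, name_b, "parallel"))
        elif abs(angle - 90.0) <= tol_deg:
            found.append((name_a, name_b, "orthogonal"))
    return found


# Illustrative axes for three primitives of one object (made-up values).
axes = {
    "body": (0.0, 1.0, 0.0),
    "neck": (0.05, 1.0, 0.0),    # nearly parallel to the body
    "handle": (1.0, 0.02, 0.0),  # nearly orthogonal to the body
}
constraints = detect_constraints(axes)
```

In this sketch a detected pair is merely reported. In an arrangement along the lines of claim 16, each reported pair would be added as a constraint to the respective primitives, and the remaining listed constraints (collinear, overlapping and coplanar axis endpoints, and coplanar axes) could be tested analogously with a distance tolerance in place of the angular one.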